16 research outputs found

    Methods for ranking user-generated text streams: a case study in blog feed retrieval

    User-generated content is one of the main sources of information on the Web today. With the huge amount of this type of data being generated every day, an efficient and effective retrieval system is essential. The goal of such a system is to enable users to search through this data and retrieve documents relevant to their information needs. Among the retrieval tasks for user-generated content, retrieving and ranking streams is an important one with various applications. The goal of this task is to rank streams, viewed as chronologically ordered collections of documents, in response to a user query. This differs from traditional retrieval tasks, where the goal is to rank single documents and temporal properties matter less in the ranking. In this thesis we investigate the problem of ranking user-generated streams with a case study in blog feed retrieval. Blogs, like all other user-generated streams, have specific properties and require new considerations in the retrieval methods. Blog feed retrieval can be defined as retrieving blogs with a recurrent interest in the topic of the given query. We define three properties of blog feed retrieval, each of which introduces new challenges in the ranking task: 1) term mismatch in blog retrieval, 2) evolution of topics in blogs, and 3) diversity of blog posts. For each of these properties, we investigate the corresponding challenges and propose solutions to overcome them. We further analyze the effect of our solutions on the performance of a retrieval system. We show that taking these properties into account when developing the retrieval system helps to improve on state-of-the-art retrieval methods. In all the proposed methods, we pay specific attention to temporal properties, which we believe carry important information in any type of stream. We show that, when combined with content-based information, temporal information can be useful in different situations. Although we apply our methods to blog feed retrieval, they are mostly general methods that are applicable to similar stream ranking problems such as ranking experts or ranking Twitter users.

    Association between anogenital distance as a noninvasive index in the diagnosis and prognosis of reproductive disorder: A systematic review

    Background: There are 2 measures of anogenital distance (AGD) in men and women. AGD has been used as an indicator of fetal androgen dysfunction and an adverse outcome in adulthood. Some studies have shown that AGD can serve as a predictor in the diagnosis and prognosis of diseases and disorders. Objective: To systematically summarize the latest evidence on AGD as a new approach to the prognosis and early diagnosis of diseases. Materials and Methods: A systematic review of the available literature was performed using Medline via PubMed, Scopus, and ISI Web of Knowledge up to July 2021, using search terms “anogenital distance” OR “anogenital index” OR “ano genital distance” OR “ano genital index”. Language restrictions were not imposed. Results: After reviewing the retrieved articles, 47 unique studies were included in this systematic review. Different outcomes, including endometriosis, prostate cancer, polycystic ovary syndrome, pelvic organ prolapse, hypospadias, cryptorchidism, fertility and semen parameters, maternal and birth development, and ovarian and gynecological-related disorders, have been studied in the included evidence. A negative association was observed between AGD and both endometriosis and hypospadias, and a positive association between AGD and prostate cancer, polycystic ovary syndrome, male fetal gender, and fertility parameters. Conclusion: Using quantitative indicators such as AGD may be a useful clinical tool for diagnosing diseases. Although many studies have shown an association between AGD and diseases, some factors, including different measurement methods, different measurement tools, age, and different definitions of AGD, can contribute to the variation of AGD. Key words: Genitalia, Prognosis, Early diagnosis, Reproductive health

    Linguistic aggregation methods in blog retrieval

    This paper addresses the blog distillation problem: given a user query, find the blogs that are most related to the query topic. We model each post as evidence of the relevance of a blog to the query, and use aggregation methods such as Ordered Weighted Averaging (OWA) operators to combine the evidence. We show that using only highly relevant evidence (posts) for each blog can result in an effective retrieval system. We also take into account the importance of the posts in a query-based cluster and investigate its effect on the aggregation results. We use prioritized OWA operators and show that considering the importance is effective when the number of aggregated posts from each blog is high. We carry out our experiments on three different data sets (TREC07, TREC08 and TREC09) and show statistically significant improvements over the state-of-the-art Voting Model.
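
A minimal sketch of the OWA aggregation idea described in this abstract, not the paper's implementation. An OWA operator sorts the evidence scores in descending order and applies weights to rank positions rather than to specific posts. The helper `top_k_weights` and the function `rank_blogs` are hypothetical names introduced here; putting uniform weight on the k best posts is an assumed weighting that mirrors "using only highly relevant evidence", and the prioritized OWA variant is not shown.

```python
from typing import Dict, List, Sequence


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging: weights attach to rank positions."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(scores, reverse=True)  # best evidence first
    return sum(w * s for w, s in zip(weights, ordered))


def top_k_weights(n: int, k: int) -> List[float]:
    """Assumed weighting: uniform mass on the k best positions, zero elsewhere."""
    k = max(1, min(k, n))
    return [1.0 / k] * k + [0.0] * (n - k)


def rank_blogs(post_scores: Dict[str, List[float]], k: int) -> List[str]:
    """Score each blog by OWA over its post relevance scores, then sort."""
    blog_scores = {
        blog: owa(scores, top_k_weights(len(scores), k))
        for blog, scores in post_scores.items()
        if scores  # blogs with no scored posts are skipped
    }
    return sorted(blog_scores, key=blog_scores.get, reverse=True)
```

With `k=1` a blog is judged by its single best post; with a larger `k` a blog needs several relevant posts to rank well, which is closer to the notion of recurrent interest.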

    Employing document dependency in blog search

    The goal in blog search is to rank blogs according to their recurrent relevance to the topic of the query. State-of-the-art approaches view it as an expert search or resource selection problem. We investigate the effect of content-based similarity between posts on the performance of the retrieval system. We test two different approaches for smoothing (regularizing) relevance scores of posts based on their dependencies. In the first approach, we smooth term distributions describing posts by performing a random walk over a document-term graph in which similar posts are highly connected. In the second, we directly smooth scores for posts using a regularization framework that aims to minimize the discrepancy between scores for similar documents. We then extend these approaches to consider the time interval between the posts in smoothing the scores. The idea is that if two posts are temporally close, then they are good sources for smoothing each other's relevance scores. We compare these methods with the state-of-the-art approaches in blog search that employ language-modeling-based resource selection algorithms and fusion-based methods for aggregating post relevance scores. We show performance gains over the baseline techniques, which do not take advantage of the relations between posts for smoothing relevance estimates.
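
The second approach (direct score regularization) can be illustrated with a generic graph-regularization sketch. The abstract does not give the objective, so the one below is an assumption: minimise sum_i (f_i - y_i)^2 + (mu/2) * sum_ij w_ij (f_i - f_j)^2, where y are the initial relevance scores and w_ij is a symmetric similarity between posts. The exponential time decay in `temporal_weight` and its parameter `tau` are likewise hypothetical choices for the temporal extension.

```python
import math
from typing import List


def temporal_weight(sim: float, t_i: float, t_j: float, tau: float = 7.0) -> float:
    """Assumed edge weight: content similarity damped by the time gap (in days)."""
    return sim * math.exp(-abs(t_i - t_j) / tau)


def smooth_scores(initial: List[float], weights: List[List[float]],
                  mu: float = 1.0, iterations: int = 100) -> List[float]:
    """Jacobi iterations for the assumed objective.

    `weights` must be symmetric, non-negative, with zeros on the diagonal.
    Setting the gradient to zero gives the update
        f_i = (y_i + mu * sum_j w_ij f_j) / (1 + mu * sum_j w_ij).
    """
    f = list(initial)
    n = len(f)
    for _ in range(iterations):
        f = [
            (initial[i] + mu * sum(weights[i][j] * f[j] for j in range(n)))
            / (1.0 + mu * sum(weights[i]))
            for i in range(n)
        ]
    return f
```

Posts with no similar neighbours keep their initial score, while strongly connected posts are pulled toward each other; `mu` controls how much the initial estimates are trusted.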

    Legitimacy of pledging the body organs of a living human as collateral: mortgage (rahn) of body organs

    Background and Aim: The relationship between a person and their own "body organs" now receives considerable attention in resolving certain emerging issues, such as the pledging (rahn) of non-vital body organs. Determining the nature and scope of this relationship, by tracing its jurisprudential and legal roots and by considering the property value of body organs during life, is among the topics that need to be re-examined. Today, poor economic and living conditions, particularly in families without a guardian, mean that few people come forward to act as surety or guarantor for them in various matters. Moreover, most offenders, as an instance of imprisoned debtors, are economically weak, and neither they nor their aqila (the kin liable for blood money) can provide the diya (blood money), while the rightful claimant also demands what is due. In such cases the debtor may pledge some of their non-vital body organs, as an asset that now has property value, so that they can raise the blood money by working in society. The question examined here is whether, if the debtor fails to provide it, the rule "permission for a thing is also permission for its necessary consequences" gives the claimant the right to donate or sell the debtor's non-vital organs through a legal request, judicial procedure, and referral to health-care institutions. Materials and Methods: Using a descriptive-analytical method and library sources, this study first examines whether body organs have property value and then assesses the feasibility of pledging body organs with regard to the elements of the rahn contract. Conclusion: In view of the prevailing opinion of the jurists, and on the basis of rules and evidence including the "rule of dominion (taslit)", the "principle of permissibility (ibaha)", "preservation of the object while its benefits are enjoyed", the "principle that agreements are consensual", and "rational benefits grounded in public interest", among others, the legitimacy of pledging body organs as collateral is established.

    Building queries for prior-art search

    Prior-art search is a critical step in the examination procedure of a patent application. This study explores automatic query generation from patent documents to facilitate the time-consuming and labor-intensive search for relevant patents. It is essential for this task to identify discriminative terms in different fields of a query patent, which enable us to distinguish relevant patents from non-relevant patents. To this end, we investigate the distribution of terms occurring in different fields of the query patent and compare these distributions with the rest of the collection using language modeling estimation techniques. We experiment with term weighting based on the Kullback-Leibler divergence between the query patent and the collection, and also with parsimonious language model estimation. Both of these techniques promote words that are common in the query patent and rare in the collection. We also incorporate the classification assigned to patent documents into our model, to exploit available human judgements in the form of a hierarchical classification. Experimental results show that retrieval using the generated queries is effective, particularly in terms of recall, and that the patent description is the most useful source for extracting query terms.
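
A minimal sketch of KL-based term weighting as described in this abstract, under stated assumptions. Each term is scored by its pointwise contribution to the Kullback-Leibler divergence, P(t|Q) * log(P(t|Q) / P(t|C)), so that terms frequent in the query patent and rare in the collection score highest. The add-one smoothing of the collection model, the function names, and the toy counts are assumptions made here; the paper's estimation details, its per-field handling, and the parsimonious language model are not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List


def kl_term_scores(query_tokens: List[str],
                   collection_counts: Dict[str, int]) -> Dict[str, float]:
    """Pointwise KL contribution of each query-patent term."""
    q_counts = Counter(query_tokens)
    q_total = sum(q_counts.values())
    c_total = sum(collection_counts.values())
    vocab_size = len(set(collection_counts) | set(q_counts))
    scores = {}
    for term, count in q_counts.items():
        p_q = count / q_total
        # Assumed add-one smoothing so unseen terms get a non-zero probability.
        p_c = (collection_counts.get(term, 0) + 1) / (c_total + vocab_size)
        scores[term] = p_q * math.log(p_q / p_c)
    return scores


def build_query(query_tokens: List[str],
                collection_counts: Dict[str, int], k: int) -> List[str]:
    """Keep the k highest-scoring terms as the generated query."""
    scores = kl_term_scores(query_tokens, collection_counts)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Terms that are more probable in the collection than in the query patent, such as function words, receive negative scores and drop out of the generated query.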

    Rich document representation and classification: an analysis


    Blog Distillation using Random Walks

    This paper addresses the blog distillation problem: given a user query, find the blogs most related to the query topic. We model the blogosphere as a single graph that includes extra information besides the content of the posts. By performing a random walk on this graph, we extract the most relevant blogs for each query. Our experiments on the TREC'07 data set show a 15% improvement in MAP and an 8% improvement in Precision@10 over the Language Modeling baseline.
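
The random-walk idea can be illustrated with a generic random walk with restart, computed by power iteration. This is a sketch under assumptions, not the paper's model: the abstract does not specify the graph, so the node types (posts and blogs), the edge weights, the restart distribution built from query relevance scores, and the damping value are all hypothetical.

```python
from typing import Dict


def random_walk_with_restart(graph: Dict[str, Dict[str, float]],
                             restart: Dict[str, float],
                             damping: float = 0.85,
                             iterations: int = 50) -> Dict[str, float]:
    """Visit probabilities of a walk that jumps back to `restart` nodes.

    `graph[u][v]` is the weight of edge u -> v. Every node, including pure
    targets, must appear as a key of `graph`, and every key of `restart`
    must be a node of `graph`.
    """
    nodes = list(graph)
    total = sum(restart.values())
    r = {n: restart.get(n, 0.0) / total for n in nodes}
    p = dict(r)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * r[n] for n in nodes}
        for u in nodes:
            out = sum(graph[u].values())
            if out == 0:
                # Dangling node: send its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += damping * p[u] * r[n]
                continue
            for v, w in graph[u].items():
                nxt[v] += damping * p[u] * w / out
        p = nxt
    return p
```

In a toy graph where posts link to their blogs and the restart distribution holds the posts' query relevance scores, blogs are ranked by the probability mass that reaches their nodes.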

    Evaluating answer passages using summarization measures

    Passage-based retrieval models have been studied for some time and have been shown to have some benefits for document ranking. Finding passages that are not only topically relevant, but are also answers to the users' questions would have a significant impact in applications such as mobile search. To develop models for answer passage retrieval, we need to have appropriate test collections and evaluation measures. Making annotations at the passage level is, however, expensive and can have poor coverage. In this paper, we describe the advantages of document summarization measures for evaluating answer passage retrieval and show that these measures have high correlation with existing measures and human judgments.

    Automatic refinement of patent queries using concept importance predictors

    Patent prior art queries are full patent applications which are much longer than standard web search topics. Such queries are composed of hundreds of terms and do not represent a focused information need. One way to make the queries more focused is to select a group of key terms as representatives. Existing works show that such a selection to reduce patent queries is a challenging task mainly because of the presence of ambiguous terms. Given this setup, we present a query modeling approach where we utilize patent-specific characteristics to generate more precise queries. We propose to automatically disambiguate query terms by employing noun phrases that are extracted using the global analysis of the patent collection. We further introduce a method for predicting whether expansion using noun phrases would improve the retrieval effectiveness. Our experiments show that we can obtain almost 20% improvement by performing query expansion using the true importance of the noun phrase queries. Based on this observation, we introduce various features that can be used to estimate the importance of the noun phrase query. We evaluated the effectiveness of the proposed method on the patent prior art search collection CLEF-IP 2010. Our experimental results indicate that the proposed features make good predictors of the noun phrase importance, and selective application of noun phrase queries using the importance predictors outperforms existing query generation methods.